Gene expression analysis of HIV patients with and without M. tuberculosis co-infection

Group 3: Anne Skov-Johannessen s184330, Dea F. Skipper s184324, Helene B. L. Petersen s194699, Johanne B. Overgaard s194691 and Rebecca C. Grenov s184344

Introduction

Tuberculosis (TB) is the leading cause of death in HIV-infected individuals.

  • Weakened immune system

  • Limits the sensitivity of TB diagnosis

A support vector machine (SVM) was used to find a 251-gene signature

  • Genes involved in Immunological, Infectious and Inflammatory Disease

Our aim:

  • Identify genes whose expression differs significantly in HIV patients with TB co-infection

  • Compare with the 251-gene signature found with the SVM model

Methods

Keep it clean and tidy:

  • Select variables

  • Mutate variables

  • Handle the key variable

  • Handle replications

Methods

Normalization - minimize technical variability

Log transformation - stabilize variance, reduce skewness

Quantile Normalization:

  1. Sort the expression levels for each patient.
  2. Calculate the mean expression level across patients at each rank.
  3. Assign this mean to every gene at that rank.
  4. Rearrange the genes for each patient to restore the original order.
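
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the project: the function name `quantile_normalize`, the genes-by-patients layout, and the toy matrix are all assumptions, and ties are broken by position rather than averaged.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile-normalize a genes x patients matrix (one column per patient)."""
    expr = np.asarray(expr, dtype=float)
    # Step 1: sort the expression levels within each patient (column).
    order = np.argsort(expr, axis=0, kind="stable")
    sorted_expr = np.take_along_axis(expr, order, axis=0)
    # Step 2: mean expression level across patients at each rank.
    rank_means = sorted_expr.mean(axis=1)
    # Steps 3 and 4: assign the rank mean to each gene at that rank and
    # write it back at the gene's original position.
    normalized = np.empty_like(expr)
    np.put_along_axis(normalized, order, rank_means[:, None], axis=0)
    return normalized

# Toy data (hypothetical): 4 genes measured in 3 patients, no ties.
raw = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(raw)
```

After normalization every patient has the same set of values, and the ranking of genes within each patient is unchanged. If a log transformation is applied first, one common choice (an assumption here, since the base is not stated) is `quantile_normalize(np.log2(raw + 1))`.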

Results - PCA

Variance explained by the principal components

  • First PC explains 15% of the variance

  • 31 PCs needed to explain 90% of variance
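
Figures like these can be derived from the singular values of the centered data matrix. The sketch below is illustrative only: the function names, the patients-by-genes layout, the random toy data, and the choice to center without scaling are assumptions, not details taken from the analysis.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component.

    X is a patients x genes matrix; columns are centered but not scaled.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

def n_components_for(ratio, threshold=0.90):
    """Smallest number of leading PCs whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Toy data (hypothetical): 20 patients, 50 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
ratio = explained_variance_ratio(X)
n_needed = n_components_for(ratio, threshold=0.90)
```

Here `ratio[0]` corresponds to the share of variance explained by the first PC, and `n_needed` to the number of PCs required to reach 90%.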

Results - PCA

Scatter plot of projected observations onto PC1 and PC2

  • Slight separation by disease state along PC1

  • No clear separation by gender

Results - Linear Regression

Forest plot

  • Most significant genes are downregulated

Results - Linear Regression

Volcano plot

  • None of the significant genes are part of the 251-gene tuberculosis signature
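
A per-gene regression followed by a signature comparison could look roughly like the sketch below. The model form (expression regressed on a 0/1 disease indicator), the Bonferroni threshold, the gene names, and the one-gene `signature` set are all assumptions made for illustration; the slides do not specify these details.

```python
import numpy as np
from scipy import stats

def per_gene_regression(expr, group):
    """Fit expression ~ group separately for each gene.

    expr: patients x genes matrix; group: 0/1 indicator per patient.
    Returns the slope and p-value of the group effect for every gene.
    """
    expr = np.asarray(expr, dtype=float)
    group = np.asarray(group, dtype=float)
    slopes = np.empty(expr.shape[1])
    p_values = np.empty(expr.shape[1])
    for j in range(expr.shape[1]):
        fit = stats.linregress(group, expr[:, j])
        slopes[j] = fit.slope
        p_values[j] = fit.pvalue
    return slopes, p_values

# Toy data (hypothetical): 10 patients per group, 5 genes.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 10)
expr = rng.normal(size=(20, 5))
expr[:, 0] -= 3.0 * group  # make gene g0 strongly downregulated in group 1

gene_names = ["g0", "g1", "g2", "g3", "g4"]
signature = {"g3"}  # placeholder standing in for the 251-gene signature

slopes, p_values = per_gene_regression(expr, group)
alpha = 0.05 / len(gene_names)  # Bonferroni correction (assumed)
significant = [g for g, p in zip(gene_names, p_values) if p < alpha]
overlap = set(significant) & signature
```

A negative slope indicates lower expression in the group coded 1, which is what a volcano or forest plot would show as downregulation. An empty `overlap` corresponds to the case where no significant gene belongs to the signature.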